Abstract. - Generalized preferential attachment is defined as the
tendency of a vertex to acquire new links in the future with respect to
a particular vertex property. Understanding which properties influence
link acquisition tendency (LAT) gives us predictive power to estimate
the future growth of a network and insight into the actual dynamics
governing complex networks. In this study, we explore the effect of age
and degree on LAT by analyzing data collected from a new
complex-network growth dataset. We found that the LAT and the degree of
a vertex are linearly correlated, in accordance with previous studies.
Interestingly, the relation between the LAT and the age of a vertex is
found to be in conflict with the known models of network growth. We
identified three different periods in the network's lifetime where the
relation between age and LAT is strongly positive, almost stationary,
and negative, respectively.
Introduction. — One of the most profound discoveries in complex-network
studies was the realization that the structure and dynamics of many
real-world networks are not completely random but follow organized
behavior. The power-law degree distribution observed in many complex
networks has attracted considerable attention because it is a
significant deviation from random behavior [1,3]. In this study, we
focus on the dynamics that lead to power-law degree distributions.

In a dynamic complex network, there is a continuous 
creation of vertices and formation of links between the 
vertices (vertex and link removal can be included in this 
abstraction as well). For many networks, it is natural to 
view this process as a competition between the vertices to 
acquire the newly formed links [4]. The resulting degree 
distribution will be based on the link acquisition tendency 
(LAT) values of individual vertices. It is interesting to ask
which vertex properties affect link acquisition tendency, and in
which ways.

In this study, we aim to analyze the effect of some basic 
properties such as age and degree of a vertex on the link 
acquisition tendency. Such an analysis requires growth 
data of networks with precise time stamps of the vertex 
and link creations. As Newman [5] states, obtaining such dynamic data
is difficult and, most of the time, the time resolution is low.
Although a limited number of studies analyze preferential attachment in
several networks, their focus is on degree-related preferential
attachment [5-9]. Furthermore, many of these studies analyze dynamic
networks obtained from social collaboration and scientific citation
data. Considering these limitations of the previous studies, we decided
to use a more generalized methodology that allows us to analyze the
effect on link acquisition not only of degree but also of other vertex
properties. Instead of analyzing a previously published dataset, we
decided to utilize a new dataset that comes from a previously
unexplored domain and has a high time resolution, which allows us to
analyze preferential attachment on short time scales.

Methodology. — We assume that the network con- 
tains directed links and a vertex is said to acquire a new 
link if a new link terminating at that vertex is formed. 
We define the generalized preferential attachment as the 
tendency to acquire new links with respect to vertex properties [9].
Age and degree of the vertices are the two properties we will discuss.
It is possible to formalize the notion of preferential attachment
without referencing a particular vertex property.

Our analysis aims to measure LAT as a function of the vertex properties
being investigated. It is based on data collected during an interval
$[t_0, t_0 + \Delta t]$ of the network's lifetime. First, we construct
a snapshot graph of the network at $t_0$, record the properties of the
vertices in that graph, and assume they do not change significantly
during the analysis interval $[t_0, t_0 + \Delta t]$. We group the
vertices having the same property value together and calculate the
average number of new links that each group acquires during the
interval. The average number of new links as a function of the vertex
property value is a measure of the effect of having a specific property
value on the link acquisition tendency. It is possible to view this
process as calculating a histogram: we assign each vertex to a bin
according to its property value at $t_0$ and record the number of new
links accumulated for each bin during the analysis interval. By
applying appropriate normalization, it is possible to formalize this
measure as a probability function conditioned on the property value, as
we will see below.
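The histogram view of the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `average_new_links` and the input shapes (a dict of property values recorded at $t_0$ and a list of the vertices at which new links terminated) are hypothetical, and links that terminate at vertices created after $t_0$ are simply ignored, which the text does not specify.

```python
from collections import defaultdict

def average_new_links(prop_at_t0, new_link_targets):
    """Average number of new incoming links per vertex, per property value.

    prop_at_t0: dict mapping vertex -> property value in the snapshot at t0
    new_link_targets: iterable of vertices at which a new link terminated
                      during the analysis interval
    """
    group_size = defaultdict(int)   # number of vertices in each bin
    group_links = defaultdict(int)  # new links accumulated by each bin
    for value in prop_at_t0.values():
        group_size[value] += 1
    for vertex in new_link_targets:
        # Assumption: vertices absent from the snapshot are skipped.
        if vertex in prop_at_t0:
            group_links[prop_at_t0[vertex]] += 1
    return {value: group_links[value] / group_size[value]
            for value in group_size}
```

For degree as the property, `prop_at_t0` would hold each vertex's degree in the snapshot; for age it would hold the birthday in days.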

Let $m$ be a generic vertex property (e.g. age, degree, etc.) taking
one of the values $M = \{m_1, m_2, \ldots, m_q\}$ for each vertex.
$P(m = m_i)$ is the probability that a vertex has property value $m_i$.
This probability distribution is written as $P(m)$ for short. Let the
event $L$ denote the acquisition of a new link by a vertex. $P(L)$ is
the probability for a particular vertex to acquire a new link. By
definition, without any a priori information, $P(L) = 1/n$ where $n$ is
the number of vertices. The conditional probability
$P(m = m_i \mid L)$ is the probability of observing a vertex with
property value $m_i$ at the termination point of a newly formed link.
This probability distribution function is written as $P(m \mid L)$ for
notational simplicity. Finally, the conditional probability
$P(L \mid m = m_i)$ is the probability that a particular vertex will
acquire the next link to be formed given that the property value of the
vertex is $m_i$. It is a measure of the effect of property $m$ on link
acquisition.

By applying the Bayes formula, we can calculate $P(L \mid m = m_i)$ as
follows:

$$P(L \mid m = m_i) = \frac{P(m = m_i \mid L)\, P(L)}{P(m = m_i)} \qquad (1)$$



This value gives us the link acquisition tendency as a function of the
property $m$ and is our measure of the LAT. Unfortunately, it is not
possible to calculate it directly from the data, but we can estimate
the quantities on the right side of Eq. 1 in order to estimate
$P(L \mid m)$. Let $\Delta l_{\mathrm{total}}$ be the total number of
links acquired by all vertices during the interval
$[t_0, t_0 + \Delta t]$. Let $\Delta l_{m_i}$ denote the total number
of links acquired during the interval $[t_0, t_0 + \Delta t]$ by the
vertices with property value $m_i$ at $t_0$. $\hat{P}(m \mid L)$ serves
as an estimate of $P(m \mid L)$. It is calculated as follows:



$$\hat{P}(m = m_i \mid L) = \frac{\Delta l_{m_i}}{\Delta l_{\mathrm{total}}} \qquad (2)$$



If we plug the empirical distribution of the property $m$ over the
vertices at $t_0$, $P(m)$, and the estimate $\hat{P}(m \mid L)$ into
the right side of Eq. 1, we obtain an estimate of the link acquisition
tendency as a function of property $m$.
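Eq. 1 and Eq. 2 can be combined into a small estimator. The Python sketch below is illustrative only: the name `estimate_lat` and the input format are assumptions, and links terminating at vertices created after $t_0$ are ignored, a detail the text leaves open. Writing $n_{m_i}$ for the number of snapshot vertices with value $m_i$ (a symbol introduced here for convenience), the substitution reduces to $\Delta l_{m_i} / (n_{m_i}\, \Delta l_{\mathrm{total}})$.

```python
from collections import Counter

def estimate_lat(prop_at_t0, new_link_targets):
    """Estimate P(L | m) for every property value m seen in the snapshot."""
    n = len(prop_at_t0)
    p_l = 1.0 / n                               # P(L) = 1/n
    count_m = Counter(prop_at_t0.values())      # vertices per property value
    # Assumption: only links terminating at snapshot vertices are counted.
    links_m = Counter(prop_at_t0[v] for v in new_link_targets
                      if v in prop_at_t0)
    total_links = sum(links_m.values())
    if total_links == 0:
        raise ValueError("no links were acquired by snapshot vertices")
    lat = {}
    for m, n_m in count_m.items():
        p_m = n_m / n                           # empirical P(m) at t0
        p_m_given_l = links_m[m] / total_links  # Eq. 2
        lat[m] = p_m_given_l * p_l / p_m        # Eq. 1
    return lat
```

Weighting each estimate by the number of vertices in its group and summing gives 1, which is a quick check that the normalization is right.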

An important point worth noting is the assumption that the distribution
$P(m)$ does not change during the interval $[t_0, t_0 + \Delta t]$. In
reality, vertex properties such as degree and age change as time
passes. Previous studies acknowledge this problem and propose using
$\Delta t$ values that are small relative to the lifetime of the
network [5-9]. Following the same approach, we use small $\Delta t$
values and assume that $P(m)$ is stationary during
$[t_0, t_0 + \Delta t]$. The effect of different $\Delta t$ values will
also be investigated.

A closer examination of our methodology reveals important similarities
between the proposed LAT measurement method and the method adopted in
[5]. Both methods employ a time-window size parameter which regulates
the length of the analysis interval, and both assume that the
underlying preferential attachment mechanism is time independent (at
least during the analysis interval). One difference is that the final
preferential attachment measures reported by [5] are relative
probability values, so it is not possible to compare the results of
different analyses (either in time or for different networks) without
carrying out a normalization beforehand. The LAT measures reported in
this study are normalized conditional probability distributions and
hence have straightforward interpretations. Another difference is that
our methodology assumes the degree distribution does not change
significantly and uses the distribution sampled at the beginning of the
analysis interval $[t_0, t_0 + \Delta t]$ for all calculations
regarding that interval, while [5] uses the exact distribution at each
individual link acquisition incident. Since both methodologies already
rely on the assumption that preferential attachment dynamics remain
time-independent during the analysis, such a simplification in the
calculations is justifiable, and it is rewarding given the easier
normalization.

Validation. — Before analyzing the new dataset, we test our method on a
synthetic network built according to a well-known network growth model,
the Barabási-Albert (BA) model [1]. Since we know the exact dynamics
behind the network growth in the BA model, we can compare our results
to the expected ones and see whether our method correctly captures the
dynamics.

The BA network is created by setting the model parameters specified in
[1] as $t$ = 1,000,000, $m_0 = 10$, and $m = 6$. The final network
contains approximately one million vertices and six million directed
links. The first 900,000 vertices are used to construct the initial
network and the remaining vertices are used to calculate the LAT
measure. Our method correctly captures the linear degree-based
preferential attachment, as seen in Fig. 1(a). It is also analytically
known that in the BA model the relation between the birthday of a
vertex and the rate at which it increases its degree is a power law
($\mathrm{LAT} \propto m^{-0.5}$, where $m$ is the creation time of a
vertex) [1]. Our method successfully captures the power-law relation,
as shown in Fig. 1(b) along with a dashed line which has the
analytically expected slope.

Fig. 1: LAT measurements for BA model. (a) Degree related LAT of BA
model. (b) Age related LAT of BA model. The dashed line shows the
analytically expected slope.
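A scaled-down version of this validation can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the ring-shaped seed network, the use of total degree, and the choice to ignore links that terminate at vertices created after the snapshot are all assumptions.

```python
import random
from collections import Counter

def grow_ba(n_vertices, m0, m, seed=0):
    """Links (source, target) of a BA-style network, in creation order."""
    rng = random.Random(seed)
    # Seed network (assumption): a ring over the first m0 vertices.
    links = [(v, (v + 1) % m0) for v in range(m0)]
    # Every vertex appears in `endpoints` once per incident link, so a
    # uniform draw from the list is a degree-proportional draw.
    endpoints = [v for link in links for v in link]
    for new in range(m0, n_vertices):
        targets = set()
        while len(targets) < m:          # m distinct existing vertices
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            links.append((new, target))
            endpoints.extend((new, target))
    return links

def degree_lat(links, n_snapshot_links):
    """Estimate P(L | degree) from the links formed after the snapshot."""
    old, new = links[:n_snapshot_links], links[n_snapshot_links:]
    degree = Counter(v for link in old for v in link)  # degree at t0
    n = len(degree)
    count_k = Counter(degree.values())                 # vertices per degree
    # Assumption: links to vertices created after the snapshot are ignored.
    gained = Counter(degree[t] for _, t in new if t in degree)
    total = sum(gained.values())
    # Eq. 1 with P(L) = 1/n, P(k) = n_k / n and Eq. 2 for P(k | L).
    return {k: (gained[k] / total) * (1 / n) / (n_k / n)
            for k, n_k in count_k.items()}
```

With `m0` seed vertices and `m` links per new vertex, a snapshot over the first `s` vertices corresponds to the first `m0 + (s - m0) * m` links, which is the value to pass as `n_snapshot_links`.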

Data. — A new network growth dataset which satisfies the aforementioned
requirements was introduced in [17]. The network is constructed by
using the data crawled from the "Ekşi Sözlük" (literally "Sour
Dictionary" in Turkish) web site [10]. This site, which we will call
the Dictionary for short, is technically a collaborative hypertext
dictionary in operation since 15 February 1999. The Dictionary is a
site in which one can find explanations and definitions of almost any
concept one can think of. Each concept is represented by a title. Each
individual definition of a title is called an entry. The entries are
listed chronologically under the titles, and each entry has an
associated timestamp indicating the date and time it was written. The
entries may contain hyper-textual cross-references to other (possibly
non-existing) titles.

By using the raw data, we defined and constructed snapshot graphs of
the Dictionary. A snapshot graph, $G_t$, is a directed graph where each
title $a$ is represented by a vertex $v_a$ and a cross-reference link
from title $a$ to title $b$ is represented by an arc from vertex $v_a$
to vertex $v_b$. $G_t$ is constructed by including all vertices and
links that were created until $t$. Several cross-references between the
same titles are represented only once by a single arc in the graph. The
time resolution of the creation times is one day for the first 2 years
and one minute for the rest of the data.
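The construction of a snapshot graph can be sketched as below. The input format is hypothetical (a dict of title creation times and a list of timestamped cross-references), and two details are assumptions rather than statements from the text: events with a timestamp equal to $t$ are included, and a referenced title counts as a vertex even if it has no entry yet.

```python
def snapshot_graph(title_created, cross_refs, t):
    """Return (vertices, arcs) of the snapshot graph G_t.

    title_created: dict mapping title -> creation time
    cross_refs: iterable of (time, source_title, target_title)
    """
    vertices = {title for title, ts in title_created.items() if ts <= t}
    arcs = set()  # a set keeps repeated cross-references as a single arc
    for ts, src, dst in cross_refs:
        if ts <= t:
            # Assumption: a referenced title is a vertex even without entries.
            vertices.update((src, dst))
            arcs.add((src, dst))
    return vertices, arcs
```

Any comparable timestamp type works here, for example integers or `datetime` objects, as long as it is used consistently.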

In the end, we obtained a complex-network growth dataset which contains
the vertex and link creation events from the first day of the network
(15 February 1999) until 01 January 2006. The final network has
1,921,425 vertices and 6,828,296 directed links. The degree
distribution of the network follows a power law of the form
$P(m) \propto m^{-\gamma}$, where $\gamma = 2.126$ [17].

LAT as a Function of Degree. — Degree-related preferential attachment
is an important concept closely related to the degree distributions in
networks. The linear preferential attachment hypothesis introduced by
the Barabási-Albert (BA) model states that the probability of a vertex
acquiring new links is linearly proportional to its current degree
[1,6]. Almost all scale-free network models either explicitly
incorporate the linear preferential attachment hypothesis [1,4,11] or
expect it to emerge from the interactions between the growth and
dynamics of the network [12,13]. Several studies provide consistent
results showing that linear preferential attachment indeed occurs in
certain complex networks [5-9,16].

If we let $m$ represent the degree of a vertex and construct the set
$M = \{m_1, m_2, \ldots, m_q\}$ of all possible degree values in the
network, then the LAT measure we calculate (i.e. $P(L \mid m)$) becomes
the degree-related preferential attachment. To measure the
degree-related LAT, we constructed an initial network by using the
snapshot of the network on 01 December 2005 and analyzed the network
growth between 01 December 2005 and 31 December 2005 (i.e. $t_0$ is
01 December 2005 and $\Delta t$ is 31 days). The calculated LAT values
are plotted in Fig. 2(a). The findings confirm that link acquisition
tendency is linearly proportional to the vertex degree, and the best
fitting line has the form
$\mathrm{LAT} = 2.346 \cdot 10^{-[\ldots]}\, m + c$.
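A best fitting line of this kind can be obtained with ordinary least squares. The text does not say which fitting procedure was used, so the sketch below assumes an unweighted fit over the (degree, LAT) pairs; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Here `xs` would hold the degree values and `ys` the corresponding LAT estimates; the returned slope plays the role of the coefficient of $m$ and the intercept the role of $c$.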

We carried out the same analysis by using the method proposed by Newman
in [5]. Its results are plotted in Fig. 2(b). For lower values of the
degree, the linearity is captured, but for higher degree values the
linear relation between the degree and the preferential attachment
measure disappears. We believe that in reality the linearity exists to
some extent even for the high degrees but the method fails to capture
it. In [5], the bin to which a vertex is assigned shifts to the right
for each link acquisition incident because the vertex's degree
increases every time it acquires a new link. This artificially leads to
low preferential attachment values in the areas where the vertices are
sparse. This effect is clearly visible in the circled data points in
Fig. 2(a) and Fig. 2(b). In both cases the data points are obtained for
the same vertex, but in our methodology only one data point is produced
for the vertex while in Newman's method a shifting series of data
points is produced.

Fig. 2: (a) LAT vs. degree calculated by using our newly proposed
method and (b) preferential attachment measure obtained by the method
adopted in [5]. $t_0$ = 01 December 2005 and $\Delta t$ = 31 days for
both cases.

In order to provide evidence that the observed linear dependence is a
global property of the network and not a temporary phenomenon specific
to the interval 01 - 31 December 2005, we repeated the measurements
with our proposed method for different values of $t_0$. We obtained
similar results for every interval we analyzed: the relation between
the LAT and the degree of a vertex is linear regardless of the time of
the analysis. However, the slope of the best fitting line changes
significantly as time passes. Figure 3 presents the slopes of the
degree-related LAT values, which exhibit a significant decrease even
between different months of the same calendar year.



Fig. 3: Change in the slope of degree related LAT value. 

LAT as a Function of Age. — The age of a vertex is defined as the
number of days passed since the creation of the vertex. It is possible
to consider the creation time (i.e. birthday) of a vertex in the
analyses instead of its age, and we do so in order to present our
graphs in a way compatible with the previous studies [1]. In this
section, the generic property $m$ represents the birthday of a vertex
in days.
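The conversion between calendar dates, birthdays and ages can be sketched as follows. The function names are hypothetical, and counting 15 February 1999 (the first day of the Dictionary) as day 1 rather than day 0 is an assumption.

```python
from datetime import date

FIRST_DAY = date(1999, 2, 15)  # first day of the Dictionary

def birthday(created):
    """Birthday in days, counting 15 February 1999 as day 1 (assumption)."""
    return (created - FIRST_DAY).days + 1

def age(created, t0):
    """Age in days of a vertex at the start of the analysis interval."""
    return (t0 - created).days
```

Under this convention 1 March 2003 is day 1476, so a creation date around March 2003 corresponds to a birthday value close to 1500.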

The age-related LAT values calculated from the measurement done between
01 December 2005 and 31 December 2005 are given in Fig. 4. The creation
times of the vertices are plotted on the x-axis in days, with the first
day corresponding to 15 February 1999. The small interval around the
500th day corresponds to the period when the Dictionary was temporarily
closed; hence no data points exist for that interval.

Fig. 4: LAT vs. age, $t_0$ = 01 December 2005, $\Delta t$ = 31 days.

Interestingly, the age-related LAT does not follow a simple
distribution. Instead, we identified three different vertex subsets
according to their age values, in which the link acquisition tendency
follows three different distributions. The subsets are named old
vertices, middle vertices, and young vertices. The old vertices are the
ones created approximately before March 2003, which have birthday
values lower than 1500. For the old vertices, the relation between the
birthday and the link acquisition tendency of a vertex is strongly
negative: the earlier a vertex is created, the higher the LAT value it
is expected to have. An exponential model provides a very good fit to
the LAT values for this period. The best fitting exponential model for
the observations has the form $\mathrm{LAT} \propto e^{-[\ldots]\, m}$.

Young vertices are the ones created during the last 60 days prior to
the analysis. Among the young vertices, the relation between birthday
and link acquisition tendency is positive, which means that being
younger (i.e. having a higher birthday value) pays off in terms of LAT
value. The best fitting exponential model has the form
$\mathrm{LAT} \propto e^{[\ldots]\, m}$.
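One common way to obtain an exponential fit of this kind is a least-squares line through the logarithm of the LAT values. The text does not state how the exponential models were fitted, so the following is only a sketch under that assumption; the name `fit_exponential` is hypothetical, and bins with a LAT of zero are dropped because their logarithm is undefined.

```python
import math

def fit_exponential(ms, lats):
    """Fit lat ~ a * exp(b * m) by least squares on log(lat)."""
    # Assumption: bins with zero LAT are excluded from the fit.
    pts = [(m, math.log(y)) for m, y in zip(ms, lats) if y > 0]
    n = len(pts)
    mean_m = sum(m for m, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    smm = sum((m - mean_m) ** 2 for m, _ in pts)
    smz = sum((m - mean_m) * (z - mean_z) for m, z in pts)
    b = smz / smm
    a = math.exp(mean_z - b * mean_m)
    return a, b
```

A negative `b` corresponds to LAT decaying with birthday, as reported for the old vertices, and a positive `b` to LAT growing with birthday, as reported for the young vertices.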

The middle vertices are the ones created between the old and young
vertices. For the middle vertices, link acquisition tendency seems to
be almost stationary with respect to the birthday. The best fitting
line for the LAT values of this subset has the form
$\mathrm{LAT} = -4.583 \cdot 10^{-[\ldots]}\, m + c$, which is a
constant line for all practical purposes.

We should stress that we are not after the precise boundaries between
these sets of vertices. Rather, the mere recognition of three different
subsets of vertices according to their creation times suggests that the
relation between link acquisition tendency and the age of a vertex
adopts qualitatively different characteristics during the lifetime of
the network.

An analysis of variance (ANOVA) with a significance 
level of 0.05 confirmed that there is a main group effect of 
the birthday values: The mean LAT values for the three 
subsets differ significantly with old vertices being the high- 
est, young vertices the second highest and the middle ver- 
tices the lowest. 

In order to assess whether this partitioning of the lifetime of the
Dictionary is valid not only for a single analysis but for the whole
lifetime of the network, we carried out extensive measurements for
different $t_0$ values, and all of them yielded similar results. A
sample of the age-related LAT values calculated by analyzing different
intervals is plotted in Fig. 5. Combining the results of the ANOVA and
the comparative analyses for different intervals, we conclude that
partitioning the lifetime of the network into three periods with the
aforementioned limits is globally valid for the network. The method
adopted in [5] also yields qualitatively similar results which confirm
our findings, but we do not present them here due to space limitations.

In order to analyze the individual characteristics of the three subsets
of the vertices in more detail, we carried out 24 different
measurements, one for each calendar month from 01 January 2004 to
31 December 2005. For each interval, we calculated the correlation
between the age and LAT values of old, middle, and young vertices
separately. The average correlation between age and LAT for the old
vertices is almost perfectly positive ($r = 0.966$) and has a 95%
confidence interval (CI) of [0.964, 0.967]. The average correlation
between age and LAT for the middle vertices is practically zero:
$r = -0.005$ with a 95% CI of [-0.060, 0.050]. The average correlation
between age and LAT for the young vertices is strongly negative
($r = -0.528$) and has a 95% CI of [-0.576, -0.479].

Fig. 5: Age related LAT for different intervals.
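The text does not state how the correlations or their confidence intervals were computed. As one plausible reading, the sketch below assumes Pearson's correlation coefficient for each monthly interval and a normal-approximation interval for the mean of the resulting values; both function names are hypothetical.

```python
import math
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mean_with_ci(values, z=1.96):
    """Mean of the values with a normal-approximation confidence interval.

    z = 1.96 gives roughly a 95% interval; this is an assumption, not the
    procedure reported in the text.
    """
    centre = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return centre, centre - half_width, centre + half_width
```

For each subset, `pearson_r` would be applied to the (age, LAT) pairs of every monthly measurement, and `mean_with_ci` to the resulting list of 24 coefficients.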

The Effect of $\Delta t$. — The only free parameter in the proposed
method is the length of the analysis window. We repeated the LAT
calculations for degree and age for different values of $\Delta t$ to
assess the importance of this parameter. A representative set of our
calculations is plotted in Fig. 6 and Fig. 7. As seen in the figures,
the degree-related LAT does not change with respect to differing values
of $\Delta t$. For the age-related LAT, however, the value of
$\Delta t$ is more important. For longer $\Delta t$, we cannot observe
the increase in the LAT values of the young vertices. This is
understandable because for longer time intervals our assumption of a
stationary $P(m)$ distribution does not hold. The young vertices at the
beginning of the analysis are no longer young at the end of the
analysis when $\Delta t$ is one year, and this interferes with the
calculated LAT values.

Conclusions and Future Work. — In agreement with previous studies, the
relation between the LAT value and the degree of a vertex is found to
be linear. The more connected vertices are more likely to acquire new
links in the future, and this likelihood is a linear function of vertex
degree.

The relation between the LAT value and the age of a vertex is more
complicated. For the old vertices, which are created roughly before the
1500th day (which corresponds to some time around March 2003), the LAT
value is positively correlated with the age. For the vertices created
after March 2003 (i.e. middle vertices), the relation between the LAT
and age (or creation time) disappears, except for the vertices created
during the last 60 days prior to the analysis (i.e. young vertices).
For the young vertices, being younger pays off in terms of LAT.

Fig. 6: Degree related LAT for different $\Delta t$ values ($t_0$ = 01 Jan …)
Fig. 7: Age related LAT for different $\Delta t$ values ($t_0$ = 01 Jan …)

In real-life dynamics, the age of a vertex certainly does not play an
explicit role in link acquisition. Users are not inclined to give
cross-references to other titles just because those titles were created
early. This reasoning suggests that at least one mediator variable must
be present and affecting LAT through age. The nature of this variable
(or possibly these variables) calls for future research.